Language modeling by variable length sequences: theoretical formulation and evaluation of multigrams
Abstract
The multigram model assumes that language can be described as the output of a memoryless source that emits variable-length sequences of words. Estimating the model parameters can be formulated as a Maximum Likelihood estimation problem from incomplete data. We show that estimates of the model parameters can be computed with an iterative Expectation-Maximization algorithm, and we describe a forward-backward procedure for its implementation. We report the results of a systematic evaluation of multigrams for language modeling on the ATIS database. The objective performance measure is the test set perplexity. Our results show that multigrams outperform conventional n-grams for this task.
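The estimation procedure summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`init_probs`, `forward`, `backward`, `em_step`, `perplexity`), the initialization from relative n-gram frequencies, the toy corpus, and the per-word perplexity convention are all assumptions made for illustration. The sketch treats a sentence as a concatenation of independent sequences of at most `max_len` words, sums over all segmentations with forward and backward passes, and re-estimates sequence probabilities from expected counts.

```python
import math
from collections import defaultdict


def init_probs(corpus, max_len):
    """Assumed initialization: relative frequency of every word sequence of length <= max_len."""
    counts = defaultdict(float)
    for sent in corpus:
        for i in range(len(sent)):
            for l in range(1, max_len + 1):
                if i + l <= len(sent):
                    counts[tuple(sent[i:i + l])] += 1.0
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}


def forward(sent, probs, max_len):
    """alpha[t] = summed likelihood of all segmentations of the first t words."""
    T = len(sent)
    alpha = [0.0] * (T + 1)
    alpha[0] = 1.0
    for t in range(1, T + 1):
        for l in range(1, min(max_len, t) + 1):
            alpha[t] += alpha[t - l] * probs.get(tuple(sent[t - l:t]), 0.0)
    return alpha


def backward(sent, probs, max_len):
    """beta[t] = summed likelihood of all segmentations of the words from position t on."""
    T = len(sent)
    beta = [0.0] * (T + 1)
    beta[T] = 1.0
    for t in range(T - 1, -1, -1):
        for l in range(1, min(max_len, T - t) + 1):
            beta[t] += probs.get(tuple(sent[t:t + l]), 0.0) * beta[t + l]
    return beta


def em_step(corpus, probs, max_len):
    """One EM iteration; returns (new probabilities, log-likelihood under the old ones)."""
    counts = defaultdict(float)
    log_lik = 0.0
    for sent in corpus:
        alpha = forward(sent, probs, max_len)
        beta = backward(sent, probs, max_len)
        z = alpha[-1]  # likelihood of the sentence summed over all segmentations
        if z == 0.0:
            continue  # sentence cannot be segmented with the current inventory
        log_lik += math.log(z)
        T = len(sent)
        for t in range(T):
            for l in range(1, min(max_len, T - t) + 1):
                s = tuple(sent[t:t + l])
                p = probs.get(s, 0.0)
                if p > 0.0:
                    # E-step: posterior weight of sequence s spanning words t .. t+l-1
                    counts[s] += alpha[t] * p * beta[t + l] / z
    total = sum(counts.values())
    # M-step: normalize the expected counts
    return {s: c / total for s, c in counts.items()}, log_lik


def perplexity(corpus, probs, max_len):
    """Per-word perplexity (assumed convention): exp(-log-likelihood / number of words)."""
    log_lik, n_words = 0.0, 0
    for sent in corpus:
        log_lik += math.log(forward(sent, probs, max_len)[-1])
        n_words += len(sent)
    return math.exp(-log_lik / n_words)


if __name__ == "__main__":
    toy = [["show", "me", "flights"], ["show", "me", "fares"], ["flights", "to", "boston"]]
    p = init_probs(toy, max_len=2)
    for _ in range(5):
        p, ll = em_step(toy, p, max_len=2)
    print(ll, perplexity(toy, p, max_len=2))
```

The sketch works with raw probabilities, so long sentences would need scaling or log-space arithmetic to avoid underflow. It also omits any pruning of the sequence inventory and any handling of sequences unseen in training, both of which a practical evaluation would require.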
Similar resources
Multigrams for language identification
In this paper we present two new approaches to language identification. Both are based on so-called multigrams, an information-theoretic representation of observations. In the first approach we use multigram models for phonotactic modeling of phoneme or codebook sequences. The multigram model can be used to segment the new observation into larger units (e.g. something like ...
Inference of variable-length linguistic and acoustic units by multigrams
The efficiency of pattern recognition algorithms depends strongly on a proper definition of the patterns assumed to structure the data. The multigram model provides a statistical tool to retrieve sequential variable-length regularities within streams of data. In this paper, we present a general formulation of the model, applicable to single or multiple parallel strings of data having eithe...
Speech spectrum representation and coding using multigrams with distance
Multigrams allow us to split a string of symbols into a stream of variable-length sequences. Because the direct application of this method to vector-quantized speech spectra fails, we develop an extension of the method called modified multigrams, or multigrams with distance. The algorithm for training the modified multigram dictionary and experimental results are presented. We found a significant ...
A New Finite Element Formulation for Buckling and Free Vibration Analysis of Timoshenko Beams on Variable Elastic Foundation
In this study, the buckling and free vibration of Timoshenko beams resting on a variable elastic foundation are analyzed by means of a new finite element formulation. The Winkler model is applied to the elastic foundation. A two-node element with four degrees of freedom is suggested for the finite element formulation. Displacement and rotational fields are approximated by cubic and quadratic polynomia...
Publication date: 1995